Results 1 - 20 of 57
1.
Wiley Interdiscip Rev RNA ; 15(2): e1838, 2024.
Article in English | MEDLINE | ID: mdl-38509732

ABSTRACT

Disruptions in spatiotemporal gene expression can result in atypical brain function. Specifically, autism spectrum disorder (ASD) is characterized by abnormalities in pre-mRNA splicing. Abnormal splicing patterns have been identified in the brains of individuals with ASD, and mutations in splicing factors have been found to contribute to neurodevelopmental delays associated with ASD. Here we review studies that shed light on the importance of splicing observed in ASD and that explored the intricate relationship between splicing factors and ASD, revealing how disruptions in pre-mRNA splicing may underlie ASD pathogenesis. We provide an overview of the research regarding all splicing factors associated with ASD and place a special emphasis on five specific splicing factors (HNRNPH2, NOVA2, WBP4, SRRM2, and RBFOX1) known to impact the splicing of ASD-related genes. In the discussion of the molecular mechanisms influenced by these splicing factors, we lay the groundwork for a deeper understanding of ASD's complex etiology. Finally, we discuss the potential benefit of unraveling the connection between splicing and ASD for the development of more precise diagnostic tools and targeted therapeutic interventions. This article is categorized under: RNA in Disease and Development > RNA in Disease; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution; RNA Evolution and Genomics > Computational Analyses of RNA; RNA-Based Catalysis > RNA Catalysis in Splicing and Translation.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Humans , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Autistic Disorder/genetics , RNA Precursors/genetics , RNA Precursors/metabolism , RNA Splicing/genetics , RNA Splicing Factors/metabolism , Neuro-Oncological Ventral Antigen
2.
Nat Microbiol ; 9(3): 595-613, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38347104

ABSTRACT

Microbial breakdown of organic matter is one of the most important processes on Earth, yet the controls of decomposition are poorly understood. Here we track 36 terrestrial human cadavers in three locations and show that a phylogenetically distinct, interdomain microbial network assembles during decomposition despite selection effects of location, climate and season. We generated a metagenome-assembled genome library from cadaver-associated soils and integrated it with metabolomics data to identify links between taxonomy and function. This universal network of microbial decomposers is characterized by cross-feeding to metabolize labile decomposition products. The key bacterial and fungal decomposers are rare across non-decomposition environments and appear unique to the breakdown of terrestrial decaying flesh, including humans, swine, mice and cattle, with insects as likely important vectors for dispersal. The observed lockstep of microbial interactions further underlies a robust microbial forensic tool with the potential to aid predictions of the time since death.


Subjects
Microbial Consortia , Soil Microbiology , Mice , Humans , Animals , Swine , Cattle , Cadaver , Metagenome , Bacteria
3.
Cell Rep Med ; 4(12): 101313, 2023 12 19.
Article in English | MEDLINE | ID: mdl-38118424

ABSTRACT

Identification of the gene expression state of a cancer patient from routine pathology imaging and characterization of its phenotypic effects have significant clinical and therapeutic implications. However, prediction of expression of individual genes from whole slide images (WSIs) is challenging due to co-dependent or correlated expression of multiple genes. Here, we use a purely data-driven approach to first identify groups of genes with co-dependent expression and then predict their status from WSIs using a bespoke graph neural network. These gene groups allow us to capture the gene expression state of a patient with a small number of binary variables that are biologically meaningful and carry histopathological insights for clinical and therapeutic use cases. Prediction of gene expression state based on these gene groups allows associating histological phenotypes (cellular composition, mitotic counts, grading, etc.) with underlying gene expression patterns and opens avenues for gaining biological insights from routine pathology imaging directly.


Subjects
Breast Neoplasms , Gene Expression Profiling , Humans , Female , Transcriptome/genetics , Neural Networks, Computer , Phenotype , Breast Neoplasms/genetics
4.
Front Bioinform ; 3: 1198218, 2023.
Article in English | MEDLINE | ID: mdl-37915563

ABSTRACT

Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
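
The abstract above says its loss is based on the ϵ-insensitive loss used for SVM regression. As a point of reference, here is a minimal sketch of that standard loss only. It is not the modified loss proposed for Qϵ, and the function names and the default `epsilon=0.1` are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Standard SVR-style loss: absolute errors up to epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def mean_epsilon_insensitive_loss(targets, predictions, epsilon=0.1):
    """Average the per-example loss over paired targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    losses = [
        epsilon_insensitive_loss(t, p, epsilon)
        for t, p in zip(targets, predictions)
    ]
    return sum(losses) / len(losses)
```

With a quality score on a 0 to 1 scale, a prediction within `epsilon` of the target contributes zero loss, so training effort goes to the decoys whose scores are clearly wrong.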

5.
Genome Biol ; 24(1): 53, 2023 03 22.
Article in English | MEDLINE | ID: mdl-36949544

ABSTRACT

BACKGROUND: Alternative splicing is a widespread regulatory phenomenon that enables a single gene to produce multiple transcripts. Among the different types of alternative splicing, intron retention is one of the least explored despite its high prevalence in both plants and animals. The recent discovery that the majority of splicing is co-transcriptional has led to the finding that chromatin state affects alternative splicing. Therefore, it is plausible that transcription factors can regulate splicing outcomes. RESULTS: We provide evidence for the hypothesis that transcription factors are involved in the regulation of intron retention by studying regions of open chromatin in retained and excised introns. Using deep learning models designed to distinguish between regions of open chromatin in retained introns and non-retained introns, we identified motifs enriched in IR events with significant hits to known human transcription factors. Our model predicts that the majority of transcription factors that affect intron retention come from the zinc finger family. We demonstrate the validity of these predictions using ChIP-seq data for multiple zinc finger transcription factors and find strong over-representation for their peaks in intron retention events. CONCLUSIONS: This work opens up opportunities for further studies that elucidate the mechanisms by which transcription factors affect intron retention and other forms of splicing. AVAILABILITY: Source code available at https://github.com/fahadahaf/chromir.


Subjects
Alternative Splicing , Transcription Factors , Animals , Humans , Introns , Transcription Factors/genetics , RNA Splicing , Chromatin/genetics
6.
Bioinformatics ; 38(Suppl_2): ii75-ii81, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124806

ABSTRACT

MOTIVATION: Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS: We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. AVAILABILITY AND IMPLEMENTATION: Code and supplementary material available at https://github.com/adibayaseen/HKRCPI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
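
The random-pairing strategy for synthetic negatives mentioned in the abstract above can be sketched as follows. This is a hypothetical illustration rather than the code from the linked repository: the function name, the identifiers, and the choice to treat every pair absent from the positive set as a candidate negative are all assumptions.

```python
import random


def sample_random_negatives(compounds, proteins, positives, n_negatives, seed=0):
    """Draw compound-protein pairs at random, skipping known interactions.

    Assumption: any pair not listed in `positives` is treated as a negative,
    although some of these pairs may be interactions not yet observed.
    """
    rng = random.Random(seed)
    known = set(positives)
    # Enumerate every pairing that is not a known positive interaction.
    candidates = [
        (c, p) for c in compounds for p in proteins if (c, p) not in known
    ]
    if n_negatives > len(candidates):
        raise ValueError("not enough non-positive pairs to sample from")
    # Sample without replacement so that no negative pair is repeated.
    return rng.sample(candidates, n_negatives)


negatives = sample_random_negatives(
    compounds=["c1", "c2", "c3"],
    proteins=["p1", "p2"],
    positives=[("c1", "p1"), ("c2", "p2")],
    n_negatives=3,
)
```

Enumerating all candidate pairs keeps the sketch short; for a large compound library, rejection sampling would avoid building the full list.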


Subjects
Angiotensin-Converting Enzyme 2 , Machine Learning , Humans , Ligands , SARS-CoV-2
7.
BMC Bioinformatics ; 23(1): 142, 2022 Apr 20.
Article in English | MEDLINE | ID: mdl-35443610

ABSTRACT

BACKGROUND: Despite recent progress in basecalling of Oxford Nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short read technologies. Furthermore, very little of the recent research was focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. RESULTS: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford Nanopore's RNA basecallers. AVAILABILITY: The source code for our basecaller is available at: https://github.com/biodlab/RODAN.


Subjects
Nanopore Sequencing , Nanopores , DNA , High-Throughput Nucleotide Sequencing , RNA , Sequence Analysis, DNA , Sequence Analysis, RNA
8.
Front Bioinform ; 2: 1083292, 2022.
Article in English | MEDLINE | ID: mdl-36591335

ABSTRACT

As practitioners of machine learning in the area of bioinformatics we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions, and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of non-interacting proteins as negative examples. Since the interactomes of all species are only partially known, this leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance for false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.

9.
Front Microbiol ; 12: 681150, 2021.
Article in English | MEDLINE | ID: mdl-34054788

ABSTRACT

Histone proteins compact and organize DNA resulting in a dynamic chromatin architecture impacting DNA accessibility and ultimately gene expression. Eukaryotic chromatin landscapes are structured through histone protein variants, epigenetic marks, the activities of chromatin-remodeling complexes, and post-translational modification of histone proteins. In most Archaea, histone-based chromatin structure is dominated by the helical polymerization of histone proteins wrapping DNA into a repetitive and closely gyred configuration. The formation of the archaeal-histone chromatin-superhelix is a regulatory force of adaptive gene expression and is likely critical for regulation of gene expression in all histone-encoding Archaea. Single amino acid substitutions in archaeal histones that block formation of tightly packed chromatin structures have profound effects on cellular fitness, but the underlying gene expression changes resultant from an altered chromatin landscape have not been resolved. Using the model organism Thermococcus kodakarensis, we genetically alter the chromatin landscape and quantify the resultant changes in gene expression, including unanticipated and significant impacts on provirus transcription. Global transcriptome changes resultant from varying chromatin landscapes reveal the regulatory importance of higher-order histone-based chromatin architectures in regulating archaeal gene expression.

10.
Nucleic Acids Res ; 49(13): e77, 2021 07 21.
Article in English | MEDLINE | ID: mdl-33950192

ABSTRACT

Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problem. We present SATORI, a Self-ATtentiOn based model to detect Regulatory element Interactions. Our approach combines convolutional layers with a self-attention mechanism that helps us capture a global view of the landscape of interactions between regulatory elements in a sequence. A comprehensive evaluation demonstrates the ability of SATORI to identify numerous statistically significant TF-TF interactions, many of which have been previously reported. Our method is able to detect higher numbers of experimentally verified TF-TF interactions than existing methods, and has the advantage of not requiring a computationally expensive post-processing step. Finally, SATORI can be used for detection of any type of feature interaction in models that use a similar attention mechanism, and is not limited to the detection of TF-TF interactions.
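
To make the self-attention idea in the abstract above concrete, here is a minimal sketch of scaled dot-product self-attention over per-position feature vectors, such as the outputs of a convolutional layer. It is not SATORI's implementation: it omits the learned query, key and value projections (an assumption made for brevity), and all names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    `features` holds one equal-length vector per sequence position.
    Returns (outputs, weights); weights[i][j] is how strongly position i
    attends to position j.
    """
    dim = len(features[0])
    weights = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted average of all positions.
    outputs = [
        [sum(w * vec[j] for w, vec in zip(row, features)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The position-by-position `weights` matrix is the kind of quantity an attention-based method can inspect when looking for pairs of sequence positions that interact.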


Subjects
Deep Learning , Genomics/methods , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Arabidopsis/genetics , Cell Line , Chromatin Immunoprecipitation Sequencing , Humans , Nucleotide Motifs , Promoter Regions, Genetic
11.
Biochem Soc Trans ; 48(6): 2399-2414, 2020 12 18.
Article in English | MEDLINE | ID: mdl-33196096

ABSTRACT

Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads have uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications. The emerging theme from recent studies is that reprogramming of gene expression in response to developmental cues and stresses at the co-/post-transcriptional level likely plays a crucial role in eliciting appropriate responses for optimal growth and plant survival under adverse conditions. Although the mechanisms by which developmental cues and different stresses regulate co-/post-transcriptional splicing are largely unknown, a few recent studies indicate that the external cues target spliceosomal and splicing regulatory proteins to modulate alternative splicing. In this review, we provide an overview of recent discoveries on the dynamics and complexities of plant transcriptomes, mechanistic insights into splicing regulation, and discuss critical gaps in co-/post-transcriptional research that need to be addressed using diverse genomic and biochemical approaches.


Subjects
Plant Proteins/metabolism , Transcriptome , Alternative Splicing , Arabidopsis/genetics , Base Sequence , Chromatin/chemistry , Chromatin/metabolism , Gene Expression Profiling , Genes, Plant , Green Fluorescent Proteins/metabolism , High-Throughput Nucleotide Sequencing , Protein Isoforms , RNA Processing, Post-Transcriptional , RNA Splicing , RNA-Seq , Sequence Analysis, RNA
12.
Genes (Basel) ; 11(8), 2020 08 03.
Article in English | MEDLINE | ID: mdl-32756364

ABSTRACT

Breast cancer is the second leading cause of death in women above 60 years in the US. Screening mammography is recommended for women above 50 years; however, 22% of breast cancer cases are diagnosed in women below this age. We set out to develop a test based on the detection of cell-free RNA from saliva. To this end, we sequenced RNA from a pool of ten women. The 1254 transcripts identified were enriched for genes with an annotation of alternative pre-mRNA splicing. Pre-mRNA splicing is a tightly regulated process and its misregulation in cancer cells promotes the formation of cancer-driving isoforms. For these reasons, we chose to focus on splicing factors as biomarkers for the early detection of breast cancer. We found that the level of the splicing factors is unique to each woman and consistent in the same woman at different time points. Next, we extracted RNA from 36 healthy subjects and 31 breast cancer patients. Recording the mRNA level of seven splicing factors in these samples demonstrated that the combination of all these factors is different in the two groups (p value = 0.005). Our results demonstrate a differential abundance of splicing factor mRNA in the saliva of breast cancer patients.


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , RNA Splicing Factors/genetics , RNA, Messenger/genetics , Saliva/metabolism , Adult , Aged , Biomarkers, Tumor/metabolism , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , Middle Aged , RNA Splicing Factors/metabolism , RNA, Messenger/metabolism
13.
Sci Rep ; 10(1): 6047, 2020 04 08.
Article in English | MEDLINE | ID: mdl-32269234

ABSTRACT

Efforts to develop effective and safe drugs for treatment of tuberculosis require preclinical evaluation in animal models. Alongside efficacy testing of novel therapies, effects on pulmonary pathology and disease progression are monitored by using histopathology images from these infected animals. To compare the severity of disease across treatment cohorts, pathologists have historically assigned a semi-quantitative histopathology score that may be subjective in terms of their training, experience, and personal bias. Manual histopathology therefore has limitations regarding reproducibility between studies and pathologists, potentially masking successful treatments. This report describes a pathologist-assistive software tool that reduces these user limitations, while providing a rapid, quantitative scoring system for digital histopathology image analysis. The software, called 'Lesion Image Recognition and Analysis' (LIRA), employs convolutional neural networks to classify seven different pathology features, including three different lesion types from pulmonary tissues of the C3HeB/FeJ tuberculosis mouse model. Although LIRA was developed to improve the efficiency of histopathology analysis for mouse tuberculosis infection models, this approach also has broader applications to other disease models and tissues. The full source code and documentation are available from https://Github.com/TB-imaging/LIRA.


Subjects
Image Processing, Computer-Assisted/methods , Lung/diagnostic imaging , Mycobacterium tuberculosis/physiology , Tuberculosis, Pulmonary/diagnostic imaging , Algorithms , Animals , Disease Models, Animal , Humans , Lung/pathology , Mice , Mice, Inbred C3H , Neural Networks, Computer , Software , Tuberculosis, Pulmonary/pathology
14.
Int J Mol Sci ; 21(3), 2020 Jan 24.
Article in English | MEDLINE | ID: mdl-31991584

ABSTRACT

Drought is a major limiting factor of crop yields. In response to drought, plants reprogram their gene expression, which ultimately regulates a multitude of biochemical and physiological processes. The timing of this reprogramming and the nature of the drought-regulated genes in different genotypes are thought to confer differential tolerance to drought stress. Sorghum is a highly drought-tolerant crop and has been increasingly used as a model cereal to identify genes that confer tolerance. Also, there is considerable natural variation in resistance to drought in different sorghum genotypes. Here, we evaluated the resistance of four genotypes to polyethylene glycol (PEG)-induced drought stress at the seedling stage and performed transcriptome analysis in seedlings of sorghum genotypes that are either drought-resistant or drought-sensitive to identify drought-regulated changes in gene expression that are unique to drought-resistant genotypes of sorghum. Our analysis revealed that about 180 genes are differentially regulated in response to drought stress only in drought-resistant genotypes and most of these (over 70%) are up-regulated in response to drought. Among these, about 70 genes are novel with no known function and the remainder are transcription factors, signaling and stress-related proteins implicated in drought tolerance in other crops. This study revealed a set of drought-regulated genes, including many genes encoding uncharacterized proteins that are associated with drought tolerance at the seedling stage.


Subjects
Gene Expression Profiling , Gene Expression Regulation, Plant/drug effects , Genotype , Polyethylene Glycols/pharmacology , Sorghum/metabolism , Transcription, Genetic/drug effects , Transcriptome/drug effects , Dehydration/genetics , Dehydration/metabolism , Sorghum/genetics
15.
Bioinformatics ; 35(14): i269-i277, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510640

ABSTRACT

MOTIVATION: Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs) and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies the relative merit of the various architectures remains unclear. RESULTS: In this study we present a systematic exploration of deep learning architectures for predicting DNA- and RNA-binding specificity. For this purpose, we present deepRAM, an end-to-end deep learning tool that provides an implementation of a wide selection of architectures; its fully automatic model selection procedure allows us to perform a fair and unbiased comparison of deep learning architectures. We find that deeper, more complex architectures provide a clear advantage with sufficient training data, and that hybrid CNN/RNN architectures outperform other methods in terms of accuracy. Our work provides guidelines that can assist the practitioner in choosing an appropriate network architecture, and provides insight on the difference between the models learned by convolutional and recurrent networks. In particular, we find that although recurrent networks improve model accuracy, this comes at the expense of a loss in the interpretability of the features learned by the model. AVAILABILITY AND IMPLEMENTATION: The source code for deepRAM is available at https://github.com/MedChaabane/deepRAM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Neural Networks, Computer , Base Sequence , DNA , RNA , Sensitivity and Specificity
16.
Plant Dis ; 103(11): 2893-2902, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31436473

ABSTRACT

Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with Dickeya dianthicola, a destructive broad-host-range bacterial pathogen found in most potato-growing regions. Dickeya is a highly variable genus, and some primers available to detect this genus and species exhibit common diagnostic failures. Upon uploading a selection of target and nontarget genomes, six primer sets were rapidly identified with Uniqprimer, of which two were specific and sensitive when tested with D. dianthicola. The remaining four amplified a minority of the nontarget strains tested. The two promising candidate primer sets were trialed with DNA isolated from 116 field samples from across the United States that were previously submitted for testing. D. dianthicola was detected in 41 samples, demonstrating the applicability of our detection primers and suggesting widespread occurrence of D. dianthicola in North America.


Subjects
Agriculture , Bacteriological Techniques , DNA Primers , Enterobacteriaceae , Solanum tuberosum , Agriculture/methods , Bacteriological Techniques/methods , DNA Primers/genetics , Enterobacteriaceae/genetics , North America , Plant Diseases/microbiology , Solanum tuberosum/microbiology
17.
BMC Bioinformatics ; 19(1): 425, 2018 Nov 15.
Article in English | MEDLINE | ID: mdl-30442086

ABSTRACT

BACKGROUND: Determining protein-protein interactions and their binding affinity is important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to limited availability of protein structure data. RESULTS: In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while expecting only protein sequence information during testing. Using the method, which is based on the framework of learning using privileged information (LUPI), we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework, which uses structure only during training, it is possible to achieve classification performance comparable to that which is obtained using structure-based features. Evaluation on an independent test set shows improved performance over the PPA-Pred2 method as well. CONCLUSIONS: The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation, but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.


Subjects
Algorithms , Computational Biology/methods , Machine Learning , Proteins/metabolism , Amino Acid Sequence , Ligands , Protein Binding , Proteins/chemistry , ROC Curve , Reproducibility of Results , Support Vector Machine
18.
Front Plant Sci ; 9: 5, 2018.
Article in English | MEDLINE | ID: mdl-29483921

ABSTRACT

Abiotic stresses affect plant physiology, development, and growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress. Analysis of approximately one billion genome-aligned RNA-Seq reads from tissue- or stress-specific libraries revealed over fifteen million novel splice junctions. Transcript models supported by both RNA-Seq and single molecule isoform sequencing (Iso-Seq) data revealed a broad array of novel stress- and/or tissue-specific isoforms. Analysis of Iso-Seq data also resulted in the discovery of 15,087 novel transcribed regions of which 164 show AS. Our findings demonstrate that abiotic stresses profoundly perturb transcript isoform profiles and trigger widespread intron retention (IR) events. Stress treatments often increased or decreased retention of specific introns - a phenomenon described here as differential intron retention (DIR). Many differentially retained introns were regulated in a stress- and/or tissue-specific manner. A subset of transcripts harboring super stress-responsive DIR events showed persisting fluctuations in the degree of IR across all treatments and tissue types. To investigate coordinated dynamics of intron-containing transcripts in the study, we quantified the absolute copy number of isoforms of two conserved transcription factors (TFs) using Droplet Digital PCR. This case study suggests that stress treatments can be associated with coordinated switches in relative ratios between fully spliced and intron-retaining isoforms and may play a role in adjusting the transcriptome to abiotic stresses.

19.
BMC Genomics ; 19(1): 21, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29304739

ABSTRACT

BACKGROUND: Intron retention (IR) is the most prevalent form of alternative splicing in plants. IR, like other forms of alternative splicing, has an important role in increasing gene product diversity and regulating transcript functionality. Splicing is known to occur co-transcriptionally and is influenced by the speed of transcription which, in turn, is affected by chromatin structure. It follows that chromatin structure may have an important role in the regulation of splicing, and there is preliminary evidence in metazoans to suggest that this is indeed the case; however, nothing is known about the role of chromatin structure in regulating IR in plants. DNase I-seq is a useful experimental tool for genome-wide interrogation of chromatin accessibility, providing information on regions of chromatin with very high likelihood of cleavage by the enzyme DNase I, known as DNase I Hypersensitive Sites (DHSs). While it is well-established that promoter regions are highly accessible and are over-represented with DHSs, not much is known about DHSs in the bodies of genes, and their relationship to splicing in general, and IR in particular. RESULTS: In this study we use publicly available DNase I-seq data in Arabidopsis and rice to investigate the relationship between IR and chromatin structure. We find that IR events are highly enriched in DHSs in both species. This implies that chromatin is more open in retained introns, which is consistent with a kinetic model of the process whereby higher speeds of transcription in those regions give less time for the spliceosomal machinery to recognize and splice out those introns co-transcriptionally. The more open chromatin in IR can also be the result of regulation mediated by DNA-binding proteins. To test this, we performed an exhaustive search for footprints left by DNA-binding proteins that are associated with IR. We identified several hundred short sequence elements that exhibit footprints in their DNase I-seq coverage, the telltale sign for binding events of a regulatory protein, protecting its binding site from cleavage by DNase I. A highly significant fraction of those sequence elements are conserved between Arabidopsis and rice, a strong indication of their functional importance. CONCLUSIONS: In this study we have established an association between IR and chromatin accessibility, and presented a mechanistic hypothesis that explains the observed association from the perspective of the co-transcriptional nature of splicing. Furthermore, we identified conserved sequence elements for DNA-binding proteins that affect splicing.


Subjects
Arabidopsis/genetics , Chromatin/chemistry , Introns , Oryza/genetics , Alternative Splicing , Chromatin/metabolism , DNA-Binding Proteins/metabolism , Deoxyribonuclease I , Protein Footprinting
20.
PLoS Comput Biol ; 13(4): e1005465, 2017 04.
Article in English | MEDLINE | ID: mdl-28394888

ABSTRACT

Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, dependent on the presence of short sequence elements with high amyloid-forming potential? The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.
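
The distinction the abstract above draws between sequence composition and sequence order can be illustrated with a small sketch. It computes amino acid composition only and is not the multiple-instance method the authors propose; the function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter

# The twenty standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def qn_fraction(sequence):
    """Combined fraction of glutamine (Q) and asparagine (N) residues."""
    composition = amino_acid_composition(sequence)
    return composition["Q"] + composition["N"]


# Composition ignores residue order, so a permuted sequence gives the same vector.
original = amino_acid_composition("QQNNAA")
permuted = amino_acid_composition("QNAQNA")
```

Because a composition vector is unchanged by permutation, any predictor built on it alone cannot depend on short ordered sequence elements.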


Subjects
Amino Acid Sequence , Asparagine/chemistry , Computational Biology/methods , Glutamine/chemistry , Machine Learning , Prions/chemistry , Amyloid/chemistry , Humans , Prions/metabolism , Yeasts